The Turing Test is subjective. It is an empirical test, not a scientific
experiment. Language complexity is much lower than the complexity of human
intelligence. So the Turing Test is invalid.


Sciences are different from mathematics. Scientific experiments can only
falsify; they can never prove a claim over unlimited possibilities. Scientific
research is an ongoing process and should always be open to new experiments.


So other existing empirical tests for AI technologies, such as the regular Go
games played by AlphaGo Zero and other computer Go systems, the
simulations and road tests of self-driving cars, and the datasets for natural
language understanding, are also inadequate.


The Technological Singularity is baseless. Driverless cars with no constraints
(i.e. SAE level 5 automated driving) are impossible. There are also problems
in the definition of SAE level 4.


In reality, there is no way to prove that a car has SAE level 4
automated-driving ability, especially when the evolution of modes in the
future is not stable. So new concepts of AI and new definitions of automated
driving should be studied.


In this paper, I will discuss the problems in the Turing Test, the problems in
the existing testing of AlphaGo Zero, self-driving cars, and natural language
understanding, and the problems in the mainstream textbook AI: A Modern
Approach. Then I will propose the Gu Test, a progressive measurement of
generic artificial intelligence based on falsifiability, which could help to
develop scientific intelligence theories gradually.


1. The Problems in Turing Test 


The Turing Test is invalid, yet it still misleads AI research widely.


Many existing tests for AI technologies have problems similar to those of the
Turing Test. So it is important to analyze its problems and clear up the
confusion it causes.


The Turing Test is subjective. Running it with different people could yield
very different results. People with different knowledge, especially with
different levels of understanding of computer technologies, could give very
different results. The subjectiveness of the Turing Test causes unstable
results, which makes the Turing Test invalid.
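
The instability described above can be illustrated with a minimal sketch. It
is only a toy model under stated assumptions, not a description of any real
Turing Test protocol: the function names, the single "humanlike" score, and
the per-judge thresholds are all hypothetical, chosen to show how the same
machine can pass or fail depending only on who judges it.

```python
# Toy model (hypothetical): each judge has a personal threshold that a
# conversation must reach before the judge believes the partner is human.
# Judges who understand computer technologies better are assumed to have
# higher thresholds.

def judge_verdict(machine_score, judge_threshold):
    """Return True if this judge takes the machine for a human."""
    return machine_score >= judge_threshold

def pass_rate(machine_score, thresholds):
    """Fraction of judges on a panel who take the machine for a human."""
    verdicts = [judge_verdict(machine_score, t) for t in thresholds]
    return sum(verdicts) / len(verdicts)

# Assumed, illustrative numbers only.
machine_score = 0.6
novice_panel = [0.3, 0.4, 0.5, 0.55]   # little computer knowledge
expert_panel = [0.65, 0.7, 0.8, 0.9]   # strong computer knowledge
mixed_panel = [0.3, 0.5, 0.7, 0.9]

print(pass_rate(machine_score, novice_panel))  # 1.0
print(pass_rate(machine_score, expert_panel))  # 0.0
print(pass_rate(machine_score, mixed_panel))   # 0.5
```

The machine is identical in all three runs; only the panel changes. A result
that depends on the choice of judges in this way does not measure a stable
property of the machine.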


Moreover, language complexity is much lower than the complexity of human
intelligence. Humans have much more intelligence than language-level
intelligence [1]. So the Turing Test is not valid, because it judges
intelligence on the basis of language conversation. Indistinguishability
between humans and computers in language conversations does not mean
equivalence of intelligence.


The Turing Test is also an empirical test, not a scientific experiment.


Sciences are different from mathematics. Scientific experiments can only
falsify; they can never prove a claim over unlimited possibilities. In fact,
equivalence of intelligence between humans and computers can never be proved;
it can only be falsified.


Scientific research is an ongoing process and should always be open to new
experiments. Even if computers pass some tests, other people could still
design new tests that disprove the claim.
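
The asymmetry between proving and falsifying can be sketched in a few lines.
This is only an illustration under assumptions: `candidate_abs` is a
hypothetical system claimed to compute the absolute value of every integer,
and `check` is a hypothetical helper. The possible outcomes are "falsified"
and "not yet falsified"; there is deliberately no "proved" outcome, because a
finite set of passed cases says nothing about the cases not yet tried.

```python
def candidate_abs(x):
    """Hypothetical system under test, claimed to equal abs() everywhere."""
    return x  # silently wrong for negative numbers

def check(claimed, reference, cases):
    """Compare the claimed system with a reference on a finite set of cases.

    Returns ("falsified", case) at the first disagreement, otherwise
    ("not yet falsified", None). It never returns "proved".
    """
    for case in cases:
        if claimed(case) != reference(case):
            return ("falsified", case)
    return ("not yet falsified", None)

# The first test designer only tries non-negative inputs.
first = check(candidate_abs, abs, [0, 1, 5, 100])
# A later designer adds one new case.
later = check(candidate_abs, abs, [0, 1, 5, 100, -3])

print(first)  # ('not yet falsified', None)
print(later)  # ('falsified', -3)
```

Passing the first four cases did not establish the claim; a single new case
designed later was enough to disprove it.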


Scientific experiments should be done under strictly controlled conditions, to
test the underlying principles. Scientific conclusions can only be derived
from these principles based on the strict conditions. From empirical tests,
people cannot derive scientific conclusions.


Other existing tests for AI technologies have many similar problems. In the
next sections, I will discuss the testing problems for computer Go systems,
self-driving cars, and natural language understanding.


2. AlphaGo Zero's Superhuman Claim 


3. Test Automated Driving 


4. The Problems in AI: A Modern Approach 


5. Measure Language Intelligence 


AI can search well and has a much better memory for text content than
humans. AI has even made considerable progress in machine translation.
However, AI does not really understand semantics. This is a Chinese room
issue, and it could be verified.


AI cannot process high-order logic properly, cannot recognize sophism, and
cannot recognize wrong thinking modes, such as the Aristotle thinking mode.


So relying on AI to make judgements could cause severe problems in juridical
practice, scientific research, education, medical practice, etc. Asking
students to obey a computer's thinking mode could damage the development of
their intelligence.


The current testing datasets for language understanding, such as SQuAD,
CoQA, QuAC, NLVR2, and the GLUE series, cannot measure the real difference
between humans and Natural Language Processing (NLP) systems. They do not help
much with processing high-order logic, recognizing sophism, verifying Chinese
room issues, etc.


All of these datasets fall into the traps of the Aristotle thinking mode. They
cannot recognize wrong thinking modes, and they are not scientific methods.


To understand human intelligence, we need a structural and systematic
analysis of human intelligence. I define several main intelligence levels:
the language level, the philosophical level, the mathematical level, and the
scientific level, each with different requirements and criteria.


Language intelligence is an important characteristic of human intelligence.
No other known life form has advanced language ability. Language is also
an important medium for human knowledge and the basis for philosophy,
mathematics, sciences, etc.


Based on languages, humans developed two important branches of studies: 
mathematics and philosophy. Mathematics develops towards accuracy. 
Philosophy develops towards integrity. 


Sciences originate from philosophy, so sciences also develop towards
integrity. Unlike philosophy, sciences draw conclusions from falsifiable
experiments with strictly controlled conditions. Beyond philosophy, sciences
also gradually introduce accuracy and mathematics.


Mathematics does not meet the criteria of sciences. It does not even have
integrity [2].


Based on these structural and systematic studies of human intelligence,
people could measure language intelligence much better.


6. Gu Test


7. Conclusion


[1] Section 5 discusses the different intelligence levels in more detail.


[2] For more details, please see my article: A Structural and Systematic Analysis 
of Human Knowledge and Studies. 


